CLAIMS 

What is claimed is: 

1. A computer-implemented method for characterizing an interrogation position in a
nucleic acid segment, comprising:

inputting into a computer system a first measure of relative allele frequency at the
interrogation position in the nucleic acid segment derived from a first sample collected from a
first group of n individuals, wherein n is an integer equal to or larger than 2;

inputting into the computer system a second measure of relative allele frequency at 
the interrogation position in the nucleic acid segment derived from a second sample collected 
from a second group of m individuals, wherein m is an integer equal to or larger than 2; and
analyzing in the computer system the first measure and the second measure to 
characterize the interrogation position. 

2. The method of claim 1, wherein the first group is a case group and the second group is
a control group.

3. The method of claim 2, wherein the individuals in the first and second groups are 
animals. 

4. The method of claim 2, wherein the individuals in the first and second groups are
mammals. 

5. The method of claim 2, wherein the individuals in the first and second groups are 
humans. 

6. The method of claim 2, wherein the individuals in the case group are humans who are 
selected based on a phenotypic characteristic of interest; the individuals in the control group 
are humans who are selected based on lack of the phenotypic characteristic of interest; and 
the step of analyzing includes analyzing in the computer system the first measure and the
second measure to characterize the interrogation position as being associated with the
phenotypic characteristic of interest. 

7. The method of claim 6, wherein the phenotypic characteristic of interest is
susceptibility or resistance to a disease, a disorder, or infection of a pathogen. 

8. The method of claim 6, wherein the phenotypic characteristic of interest is 
susceptibility or resistance to effects of uptake of food or drink.

9. The method of claim 8, wherein the drink is an alcoholic drink. 

10. The method of claim 6, wherein the phenotypic characteristic of interest is 
susceptibility to, resistance to, or adverse effects of a therapy using a therapeutic agent or a
medical device.

11. The method of claim 6, wherein the phenotypic characteristic of interest is selected from
the group consisting of cancer, hematological disorders, autoimmune diseases, inflammatory
diseases, cardiovascular diseases, liver diseases, neurodegenerative diseases, diabetes, kidney
disorders, gastrointestinal disorders, pain, bacterial infection, parasitic infection, viral
infection, and a specific stage of development thereof.

12. The method of claim 6, wherein the number of individuals in the case group, n, and 
the number of the individuals in the control group, m, are each larger than 5.

13. The method of claim 6, wherein the number of individuals in the case group, n, and 
the number of the individuals in the control group, m, are each larger than 100.

14. The method of claim 6, wherein the number of individuals in the case group, n, and
the number of the individuals in the control group, m, each independently varies between 10
and 100,000.

15. The method of claim 6, wherein the interrogation position is a SNP position. 

16. The method of claim 15, wherein the genetic region containing the SNP position was 
not previously known to be associated with the phenotypic characteristic of interest. 

17. The method of claim 1, wherein the first and second samples are each collected by
pooling biological samples from the first and second groups of individuals, respectively.

18. The method of claim 17, wherein the biological samples are pooled by combining a
substantially equal amount of biological sample from each individual in each group.

19. The method of claim 17, wherein the biological samples are selected from the group
consisting of genomic DNA, mitochondrial DNA, extragenomic DNA, cDNA, and RNA. 

20. The method of claim 17, wherein the biological samples from the first and second
groups are labeled with a detectable marker. 

21. The method of claim 20, wherein the detectable marker is selected from the group
consisting of cychrome, fluorescein, Alexa-488, radioisotopes, and biotin. 

22. The method of claim 1, wherein the first sample is collected by pooling genetic
material from the first group of individuals and amplifying the pooled genetic materials in the
first group, and the second sample is collected by pooling genetic material from the second
group of individuals and amplifying the pooled genetic materials in the second group.

23. The method of claim 22, wherein amplicons of genetic material from the first group are
labeled with a different detectable marker than are amplicons of genetic material from the 
second group. 

24. The method of claim 23, wherein amplicons of genetic materials from the first group
are each labeled with biotin; and amplicons of genetic materials from the second group are
each labeled with fluorescein. 

25. The method of claim 23, wherein the first sample and second sample are mixed after
being labeled, followed by hybridization to an array and subsequent staining, wherein said
first sample is stained with a different stain than said second sample.

26. The method of claim 1, wherein the first sample is collected by amplifying genetic 
material from each individual in the first group of individuals and then pooling the amplified 
genetic materials from all of the individuals in the first group, and the second sample is
collected by amplifying genetic material from each individual in the second group of 
individuals and then pooling the amplified genetic materials from all of the individuals in the 
second group. 

27. The method of claim 1, wherein the interrogation position contains a biallelic 
polymorphism. 

28. The method of claim 1, wherein the first measure of relative allele frequency is
derived from a first measure of intensity of signals from a first probe set on a first
oligonucleotide array; and the second measure of relative allele frequency is derived from a 
second measure of intensity of signals from a second probe set on the first or a second 
oligonucleotide array. 

29. The method of claim 28, wherein the density of the oligonucleotide array is at least 
100 probes per square centimeter.

30. The method of claim 28, wherein the density of the oligonucleotide array is at least 
1,000 probes per square centimeter.

31. The method of claim 28, wherein the density of the oligonucleotide array is between
100 and 100,000,000 probes per square centimeter.

32. The method of claim 28, wherein the density of the oligonucleotide array is between
1,000,000 and 80,000,000 probes per square centimeter.

33. The method of claim 28, wherein each of the oligonucleotides on the array is 10-100 
nucleotides in length. 

34. The method of claim 28, wherein each of the oligonucleotides on the array is 20-50 
nucleotides in length.

35. The method of claim 28, further comprising: 

correcting the first and second measures of intensity.

36. The method of claim 35, wherein the step of correcting includes subtracting a 
background intensity. 

37. The method of claim 36, wherein the background intensity is the intensity of a probe 
cell having the 1000th lowest intensity on the oligonucleotide array.

38. The method of claim 36, wherein the background intensity is determined by 
calculating an equation of the form:

</r>+</r> 

2 

39. The method of claim 28, further comprising: 

evaluating the first and second measures of intensity to determine whether the first 
probe set and the second probe set have each detected the nucleotide sequences that they
were designed to detect. 

40. The method of claim 39, wherein the step of evaluating includes determining if the 
measure of intensity of a perfectly complementary probe in the first probe set is greater than
any of the measures of intensity of mismatch probes in the first probe set, in which case the
first probe set is a conforming probe set; and determining if the measure of intensity of a
perfectly complementary probe in the second probe set is greater than any of the measures of
intensity of mismatch probes in the second probe set, in which case the second probe set is a
conforming probe set.

41. The method of claim 28, wherein the first or second probe set is included in a probe 
tiling that comprises:

a reference tiling comprising a set of reference oligonucleotide probes with a varying
nucleotide at the interrogation position and a set of complementary reference oligonucleotide
probes that are each complementary to the corresponding reference oligonucleotide probes;
and

an alternate tiling comprising a set of alternate oligonucleotide probes with a varying
nucleotide at the interrogation position and a set of complementary alternate oligonucleotide 
probes that are each complementary to the corresponding alternate oligonucleotide probes. 

42. The method of claim 41, wherein the reference and alternate oligonucleotide probes
differ from each other by a nucleotide at the interrogation position. 

43. The method of claim 41, wherein the varying nucleotide at the interrogation position 
is A, T, G, U or C. 

44. The method of claim 43, further comprising:

calculating a number of conforming probe sets for the reference tiling; 

calculating a total number of probe sets in the reference tiling; 

calculating a conformance value for the reference tiling by dividing the number of
conforming probe sets by the total number of probe sets in the reference tiling;

calculating a number of conforming probe sets for the alternate tiling;

calculating a total number of probe sets in the alternate tiling; and

calculating a conformance value for the alternate tiling by dividing the number of
conforming probe sets by the total number of probe sets in the alternate tiling.

45. The method of claim 44, further comprising:

discarding any measure of intensity obtained from the tiling that has the conformance 
value lower than 0.6. 

46. The method of claim 44, further comprising:

discarding any measure of intensity obtained from the tiling that has the conformance
value lower than 0.9.
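The conformance screen of claims 40 and 44 to 46 amounts to counting probe sets. The Python sketch below is illustrative only: it assumes each probe set is a hypothetical `(pm_intensity, mismatch_intensities)` pair, which is not a data format defined in the claims, and the function names are likewise invented for the example.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (perfect-match intensity, mismatch intensities)
ProbeSet = Tuple[float, Sequence[float]]


def is_conforming(probe_set: ProbeSet) -> bool:
    """Claim 40: the perfect-match intensity must exceed every mismatch intensity."""
    pm, mismatches = probe_set
    return all(pm > mm for mm in mismatches)


def conformance(tiling: Sequence[ProbeSet]) -> float:
    """Claim 44: number of conforming probe sets divided by total probe sets."""
    if not tiling:
        raise ValueError("tiling contains no probe sets")
    return sum(is_conforming(ps) for ps in tiling) / len(tiling)


def keep_tiling(tiling: Sequence[ProbeSet], min_conformance: float = 0.6) -> bool:
    """Claims 45/46: discard tilings whose conformance is below 0.6 (or 0.9)."""
    return conformance(tiling) >= min_conformance


reference_tiling = [
    (900.0, [120.0, 95.0, 130.0]),  # conforming
    (400.0, [450.0, 80.0, 60.0]),   # not conforming: one mismatch probe is brighter
    (700.0, [100.0, 90.0, 85.0]),   # conforming
]
print(conformance(reference_tiling))       # 2 of 3 probe sets conform
print(keep_tiling(reference_tiling, 0.6))  # True
print(keep_tiling(reference_tiling, 0.9))  # False
```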

47. The method of claim 28, wherein said first probe set and said second probe set are the 
same probe set. 

48. The method of claim 1, wherein the relative allele frequency is based on variation of a 
nucleotide at the interrogation position of the nucleic acid segment relative to a reference 
nucleic acid segment, the nucleotide at said interrogation position in said reference nucleic 
acid segment being designated a reference allele and a nucleotide at said interrogation 
position that differs from said reference allele being designated an alternate allele; and the
measure of relative allele frequency is the proportion of either said reference allele or said 
alternate allele at the interrogation position. 

49. The method of claim 48, wherein the relative allele frequency is determined by
calculating an equation of the form:

$C_R/(C_A + C_R)$ or
$C_A/(C_A + C_R)$,

where $C_R$ is the concentration of the reference allele and $C_A$ is the concentration of the alternate
allele.

50. The method of claim 48, wherein the measure of relative allele frequency is
determined by calculating an equation of the form:

$I_R/(I_A + I_R)$ or

$I_A/(I_A + I_R)$,

where $I_R$ is an intensity of signal from the reference allele, and $I_A$ is an intensity of signal
from the alternate allele.

51. The method of claim 48, wherein the measure of relative allele frequency is 
determined using measurements of hybridization to perfect match probes.

52. The method of claim 48, wherein each measure of relative allele frequency is
determined using at least two intensity of signal measurements by calculating at least one of
the equations of the form:

$\langle I_R\rangle/(\langle I_A\rangle + \langle I_R\rangle)$ and

$\langle I_A\rangle/(\langle I_A\rangle + \langle I_R\rangle)$,

where $\langle I_R\rangle$ is an average of intensities of signal measurements from the reference allele, and
$\langle I_A\rangle$ is an average of intensities of signal measurements from the alternate allele.

53. The method of claim 52, wherein the averages of intensities of signal measurements
are arithmetic means of intensities of signal measurements. 

54. The method of claim 52, wherein the averages of intensities of signal measurements 
are trimmed means of intensities of signal measurements. 
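Claims 52 to 54 divide averaged intensities, where the average may be an arithmetic or a trimmed mean. A minimal Python sketch, assuming plain lists of reference ($I_R$) and alternate ($I_A$) intensities; the function names and the symmetric trimming rule (the same fraction dropped from each tail) are assumptions, since the claims do not fix a trimming proportion.

```python
from statistics import fmean


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)  # number of values cut from each end
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return fmean(kept)


def raf_ratio_of_means(ref_intensities, alt_intensities, average=fmean):
    """<I_R> / (<I_A> + <I_R>), with a pluggable averaging function."""
    mean_r = average(ref_intensities)
    mean_a = average(alt_intensities)
    return mean_r / (mean_a + mean_r)


ref = [800.0, 820.0, 790.0, 4000.0]  # last reading is an outlier
alt = [200.0, 210.0, 190.0, 205.0]
print(raf_ratio_of_means(ref, alt))  # arithmetic means
print(raf_ratio_of_means(ref, alt, average=lambda v: trimmed_mean(v, 0.25)))  # 0.8
```

With the arithmetic mean, the single outlying reference reading pulls the estimate to about 0.888; trimming one value from each tail returns 0.8.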

55. The method of claim 48, wherein each measure of relative allele frequency is
determined using at least two intensity of signal measurements by calculating at least one of
the equations of the form:

$\langle I_R/(I_A + I_R)\rangle = \frac{1}{n}\sum_{k=1}^{n} \frac{I_{R,k}}{I_{A,k} + I_{R,k}}$ and

$\langle I_A/(I_A + I_R)\rangle = \frac{1}{n}\sum_{k=1}^{n} \frac{I_{A,k}}{I_{A,k} + I_{R,k}}$,

where $I_R$ is an intensity of signal measurement from the reference allele, $I_A$ is an intensity of
signal measurement from the alternate allele, $\langle I_R/(I_A + I_R)\rangle$ and $\langle I_A/(I_A + I_R)\rangle$ are averages
of the ratios of intensities of signal measurements, and $n$ is a number of offsets at which the
intensity of signal measurements were measured.

56. The method of claim 55, wherein the averages of the ratios of intensities of signal 
measurements are arithmetic means of the ratios of intensities of signal measurements. 

57. The method of claim 55, wherein the averages of the ratios of intensities of signal measurements
are trimmed means of the ratios of intensities of signal measurements.
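Claims 55 to 57 average the per-offset ratios instead of taking a ratio of averages. A minimal Python sketch, assuming one reference and one alternate reading per offset held in parallel lists (the function name is hypothetical), shows that the two estimators can disagree on identical readings:

```python
from statistics import fmean


def raf_mean_of_ratios(ref_intensities, alt_intensities, average=fmean):
    """<I_R / (I_A + I_R)>: average of the per-offset ratios over n offsets."""
    if len(ref_intensities) != len(alt_intensities) or not ref_intensities:
        raise ValueError("need one reference and one alternate reading per offset")
    ratios = [r / (a + r) for r, a in zip(ref_intensities, alt_intensities)]
    return average(ratios)


ref = [300.0, 100.0]  # I_R at two offsets
alt = [100.0, 100.0]  # I_A at the same offsets
print(raf_mean_of_ratios(ref, alt))      # 0.625 (mean of 0.75 and 0.5)
print(sum(ref) / (sum(alt) + sum(ref)))  # ratio of means: about 0.667
```

Averaging ratios weights every offset equally, whereas the ratio of averages lets the brightest offsets dominate. Passing a trimmed-mean function as `average` gives the variant of claim 57.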

58. The method of claim 48, wherein each measure of relative allele frequency is
determined using at least one intensity of signal measurement that has been corrected. 

59. The method of claim 58, wherein the at least one intensity of signal measurement is 
corrected by subtracting background. 

60. The method of claim 59, wherein the background is calculated at least from intensity 
of signal measurements from mismatch probes.

61. The method of claim 60, wherein the background is calculated from a mean of the 
intensity of signal measurements from mismatch probes. 

62. The method of claim 61, wherein the mean of intensities of signal measurements from
mismatch probes is an arithmetic mean. 

63. The method of claim 61, wherein the mean of intensities of signal measurements 
from mismatch probes is a trimmed mean.
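Claims 59 to 63 correct intensities by subtracting a background estimated from mismatch probes. A minimal Python sketch using the arithmetic mean of claim 62; the function name is hypothetical, the clamp at zero is an added assumption so that corrected intensities cannot go negative, and a trimmed mean (claim 63) could be substituted for `fmean`.

```python
from statistics import fmean


def subtract_mismatch_background(pm_intensities, mm_intensities, floor=0.0):
    """Subtract the arithmetic mean of mismatch-probe intensities from each reading.

    Clamping at `floor` is an illustrative assumption, not part of the claims.
    """
    background = fmean(mm_intensities)
    return [max(pm - background, floor) for pm in pm_intensities]


print(subtract_mismatch_background([500.0, 300.0, 90.0], [100.0, 120.0, 80.0]))
# [400.0, 200.0, 0.0]
```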

64. The method of claim 1, wherein the step of analyzing the first measure and the second 
measure includes determining a difference between the first measure and the second measure. 

65. The method of claim 64, further comprising:

determining whether the difference falls within a predetermined percentile of a distribution
of differences between first measures and second measures. 

66. The method of claim 65, wherein the predetermined percentile of the distribution is the top 20%.

67. The method of claim 65, wherein the predetermined percentile of the distribution is the top 5%.
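The percentile test of claims 64 to 67 can be sketched as ranking the per-position differences and keeping the top fraction. This Python sketch assumes the measures are dictionaries keyed by hypothetical position identifiers and that the top 20% means the 20% largest signed differences; rounding the cutoff count up to at least one position is also an assumption.

```python
def differences(first, second):
    """First measure minus second measure, per interrogation position."""
    return {pos: first[pos] - second[pos] for pos in first if pos in second}


def top_percentile_positions(diffs, top_fraction=0.2):
    """Positions whose difference falls in the top `top_fraction` of all differences."""
    ranked = sorted(diffs, key=diffs.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return set(ranked[:n_top])


diffs = {"p0": 0.12, "p1": -0.01, "p2": 0.03, "p3": 0.00, "p4": 0.07,
         "p5": 0.02, "p6": -0.04, "p7": 0.01, "p8": 0.05, "p9": -0.02}
print(sorted(top_percentile_positions(diffs, 0.20)))  # ['p0', 'p4']
print(sorted(top_percentile_positions(diffs, 0.05)))  # ['p0']
```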

68. The method of claim 1, further comprising:

inputting into the computer system a further first measure of relative allele frequency
that is obtained by repeating the measurement of the first measure of relative allele
frequency; and

inputting into the computer system a further second measure of relative allele
frequency that is obtained by repeating the measurement of the second measure of relative
allele frequency, wherein the step of analyzing includes determining, based on the variation
in the first measure and further first measure and the variation in the second measure and
further second measure, whether the first, further first, second and further second measures of
relative allele frequency are suitable for use in characterizing the interrogation position.

69. The method of claim 6, wherein the interrogation position is a SNP position at
position $i$ within a haplotype block having $N$ different haplotype patterns, and each of the first
and second measures of relative allele frequency is corrected by applying a first equation:

where $P_i$ is a corrected relative allele frequency of the interrogation position; $N$ is the total
number of different haplotype patterns within the haplotype block; $m_{ij}$ is a coefficient having
a value of +1 if the allele at position $i$ matches a reference allele of the SNP and having a
value of 0 if the allele at position $i$ matches an alternate allele of the SNP; and $f_j$ is a
haplotype pattern frequency.

70. The method of claim 69, wherein the first equation is constrained by a second 
equation: 

71. The method of claim 70, wherein $f_j$ has a value ranging from 0 to 1.
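The first equation of claim 69 and the second equation of claim 70 are not reproduced in the text above. Reading the definitions of $P_i$, $m_{ij}$ and $f_j$, one plausible form is $P_i = \sum_{j=1}^{N} m_{ij} f_j$, constrained by $\sum_{j=1}^{N} f_j = 1$; both forms are assumptions here, as is the reading that $m_{ij}$ refers to the allele carried by haplotype pattern $j$. The Python sketch evaluates the assumed form for one SNP; estimating the $f_j$ themselves (for example by fitting them to the measured frequencies of all SNPs in the block) is not shown, and the function name is hypothetical.

```python
def corrected_allele_frequency(m_row, f, tol=1e-9):
    """Assumed first equation: P_i = sum_j m_ij * f_j.

    m_row[j] is 1 if haplotype pattern j carries the reference allele at SNP i, else 0.
    f[j] is the frequency of haplotype pattern j.
    """
    if len(m_row) != len(f):
        raise ValueError("one coefficient per haplotype pattern is required")
    if any(c not in (0, 1) for c in m_row):
        raise ValueError("coefficients must be 1 (reference) or 0 (alternate)")
    # Assumed second equation (claim 70) and the range stated in claim 71.
    if any(not 0.0 <= fj <= 1.0 for fj in f) or abs(sum(f) - 1.0) > tol:
        raise ValueError("haplotype frequencies must lie in [0, 1] and sum to 1")
    return sum(c * fj for c, fj in zip(m_row, f))


# Three haplotype patterns; patterns 1 and 3 carry the reference allele at SNP i.
print(corrected_allele_frequency([1, 0, 1], [0.5, 0.3, 0.2]))  # approximately 0.7
```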

72. The method of claim 1, further comprising: 

calculating a difference between the first measure of relative allele frequency and the
second measure of relative allele frequency.

73. The method of claim 72, wherein the interrogation position is a SNP position at
position $i$ within a haplotype block having $N$ different haplotype patterns, and the difference
between the first measure and second measure of relative allele frequency is corrected by
applying a third equation:

where $\Delta P_i$ is a corrected relative allele frequency difference of the interrogation position;
$N$ is the total number of different haplotype patterns within the haplotype block; $m_{ij}$ is a
coefficient having a value of +0.5 if the allele at position $i$ matches a reference allele of the
SNP and having a value of -0.5 if the allele at position $i$ matches an alternate allele of the
SNP; and $\Delta f_j$ is a haplotype pattern frequency difference.
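The third equation of claim 73 and the fourth equation of claim 74 are also missing from the text. Assuming the third equation is $\Delta P_i = \sum_{j=1}^{N} m_{ij}\,\Delta f_j$ with $m_{ij} = \pm 0.5$, and that the fourth equation is the constraint $\sum_{j} \Delta f_j = 0$ (which holds whenever both groups' haplotype frequencies sum to 1), the ±0.5 coefficients give the same result as a difference of two 1/0-weighted sums, because $\sum_j (c_{ij} - 0.5)\,\Delta f_j = \sum_j c_{ij}\,\Delta f_j$ for 1/0 coefficients $c_{ij}$. The Python sketch checks this numerically; the function name and the frequencies are hypothetical.

```python
def corrected_frequency_difference(is_reference, delta_f, tol=1e-9):
    """Assumed third equation: dP_i = sum_j m_ij * df_j with m_ij = +0.5 or -0.5."""
    if len(is_reference) != len(delta_f):
        raise ValueError("one coefficient per haplotype pattern is required")
    # Assumed fourth equation: haplotype frequency differences sum to zero.
    if abs(sum(delta_f)) > tol:
        raise ValueError("haplotype frequency differences must sum to 0")
    return sum((0.5 if ref else -0.5) * d for ref, d in zip(is_reference, delta_f))


is_reference = [True, False, True]  # patterns carrying the reference allele at SNP i
f_case = [0.5, 0.3, 0.2]
f_control = [0.4, 0.4, 0.2]
delta_f = [a - b for a, b in zip(f_case, f_control)]

dp = corrected_frequency_difference(is_reference, delta_f)
# The same quantity as a difference of 1/0-weighted sums (about 0.7 - 0.6).
direct = (sum(fc for fc, ref in zip(f_case, is_reference) if ref)
          - sum(fk for fk, ref in zip(f_control, is_reference) if ref))
print(round(dp, 12), round(direct, 12))  # 0.1 0.1
```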

74. The method of claim 73, wherein the third equation is constrained by a fourth
equation: 

75. The method of claim 6, wherein the step of inputting the first measure includes
inputting a plurality of first measures, and the step of inputting the second measure includes
inputting a plurality of second measures.

76. The method of claim 75, wherein the plurality of first measures and the plurality of 
second measures correspond to duplicated or replicated measurements on the same 
interrogation position in the nucleic acid segment. 

77. The method of claim 1, wherein said interrogation position is a plurality of
interrogation positions in the same or different nucleic acid segments, and for each
interrogation position a said first measure of allele frequency and a said second measure of
allele frequency are inputted into said computer system and analyzed to characterize each
said interrogation position.

78. The method of claim 75, wherein the step of analyzing includes pairing a measure 
from the plurality of first measures and a measure from the plurality of the second measures 
based on common experimental conditions. 

79. The method of claim 75, wherein the step of analyzing includes analyzing a set of
differences of paired measures using a method selected from the group comprising: a
paired t-test; calculating an Olympic average; determining the median value; and determining
whether all members of the set have the same sign.
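The summaries listed in claim 79 are straightforward once measures are paired. A minimal Python sketch, assuming measures at the same index share experimental conditions (claim 78) and reading Olympic average as the mean after dropping the single highest and single lowest value; the function names are hypothetical. A paired t-test is available as `scipy.stats.ttest_rel` if SciPy is installed and is not reimplemented here.

```python
from statistics import fmean, median


def olympic_average(values):
    """Mean after discarding the single highest and single lowest value."""
    if len(values) < 3:
        raise ValueError("an Olympic average needs at least three values")
    return fmean(sorted(values)[1:-1])


def summarize_paired(first, second):
    """Summaries of the paired differences first[k] - second[k]."""
    if len(first) != len(second):
        raise ValueError("measures must be paired one-to-one")
    diffs = [a - b for a, b in zip(first, second)]
    return {
        "olympic_average": olympic_average(diffs),
        "median": median(diffs),
        "same_sign": all(d > 0 for d in diffs) or all(d < 0 for d in diffs),
    }


# Index k of each list is assumed to share experimental conditions (e.g. array batch).
case = [0.60, 0.62, 0.58, 0.70]
control = [0.50, 0.55, 0.52, 0.49]
print(summarize_paired(case, control))
```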

80. The method of claim 75, wherein the step of analyzing includes calculating the mean of
the plurality of first measures and the mean of the plurality of second measures. 

81. The method of claim 80, further comprising:

analyzing the absolute difference between the mean of the plurality of first measures
and the mean of the plurality of second measures by thresholding the difference between the
means, wherein if the absolute difference between the mean of the plurality of first measures
and the mean of the plurality of second measures is equal to or above the threshold value, the
interrogation position is characterized as being associated with the phenotypic characteristic
of interest.

82. The method of claim 81, wherein the threshold value is 0.04.

83. The method of claim 81, wherein the threshold value is 0.1. 

84. The method of claim 81, further comprising:

inputting a further plurality of first measures and a further plurality of second
measures;

calculating a mean of the further plurality of first measures and a mean of the further
plurality of second measures; and

analyzing the difference between the mean of the further plurality of first measures
and the mean of the further plurality of second measures by using a further threshold value.

85. The method of claim 84, wherein the threshold value is 0.05 and the further threshold
value is 0.10, wherein if the absolute difference between the mean of the further plurality of
first measures and the mean of the further plurality of second measures is equal to or above
the further threshold value, the interrogation position is characterized as being associated
with the phenotypic characteristic of interest.
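Claims 81 to 85 threshold the absolute difference of means. The Python sketch below reads claims 84 and 85 as a two-stage procedure, in which a position must pass the 0.05 screen and then the 0.10 threshold on the further measures; that staging is an interpretation, and the function names are hypothetical.

```python
from statistics import fmean


def exceeds_threshold(first, second, threshold):
    """Claim 81: |mean(first) - mean(second)| >= threshold."""
    return abs(fmean(first) - fmean(second)) >= threshold


def two_stage_call(first, second, further_first, further_second,
                   threshold=0.05, further_threshold=0.10):
    """Claims 84/85 read as a screen followed by a stricter confirmation (an assumption)."""
    return (exceeds_threshold(first, second, threshold)
            and exceeds_threshold(further_first, further_second, further_threshold))


screen_case, screen_control = [0.55, 0.57, 0.56], [0.48, 0.50, 0.49]  # |diff| about 0.07
print(two_stage_call(screen_case, screen_control, [0.60, 0.62], [0.48, 0.50]))  # True
print(two_stage_call(screen_case, screen_control, [0.60, 0.62], [0.55, 0.57]))  # False
```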

86. The method of claim 75, wherein the step of analyzing includes: 

calculating the standard deviation of the plurality of first measures and the standard 
deviation of the plurality of second measures; and

analyzing each of the standard deviations of the first and second pluralities of 
measures. 

87. The method of claim 86, wherein each of the standard deviations of the first and
second pluralities of measures is analyzed by using a chi-squared distribution to determine at
least one cutoff value for the standard deviations.

88. The method of claim 87, further comprising:

discarding the plurality of the first or second measures if the calculated standard
deviation of the plurality of the first or second measures is greater than the cutoff value.
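Claims 86 to 88 do not state how the chi-squared cutoff is derived. One conventional reading, assumed here, is that for $n$ normally distributed replicates with an expected standard deviation $\sigma_0$, the statistic $(n-1)s^2/\sigma_0^2$ follows a chi-squared distribution with $n-1$ degrees of freedom, so a cutoff on $s$ is $\sigma_0\sqrt{\chi^2_{1-\alpha,\,n-1}/(n-1)}$. The expected noise `expected_sd` and the level `alpha` are hypothetical inputs, and the sketch uses the Wilson-Hilferty approximation for the chi-squared quantile so that it needs only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist, stdev


def chi2_quantile_wh(p, dof):
    """Wilson-Hilferty approximation to the p-quantile of chi-squared with `dof` dof."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * sqrt(c)) ** 3


def sd_cutoff(expected_sd, n, alpha=0.01):
    """Largest replicate SD consistent with `expected_sd` at significance level `alpha`."""
    dof = n - 1
    return expected_sd * sqrt(chi2_quantile_wh(1.0 - alpha, dof) / dof)


def keep_replicates(measures, expected_sd, alpha=0.01):
    """Claim 88: discard the plurality if its SD exceeds the cutoff."""
    return stdev(measures) <= sd_cutoff(expected_sd, len(measures), alpha)


tight = [0.50, 0.52, 0.49, 0.51, 0.50]
noisy = [0.40, 0.60, 0.45, 0.58, 0.50]
print(keep_replicates(tight, expected_sd=0.02))  # True
print(keep_replicates(noisy, expected_sd=0.02))  # False
```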

89. The method of claim 75, further comprising:

calculating an arithmetic mean and standard deviation of the plurality of the first
measures, and an arithmetic mean and standard deviation of the plurality of second measures,
and

applying a t-test using said arithmetic means and standard deviations to determine
whether it is likely that the plurality of first measures and the plurality of second measures are
from the same or different distributions, where if the plurality of first measures and the
plurality of second measures are found likely to be from different distributions, the
interrogation position in the nucleic acid segment is characterized as being associated with
the phenotypic characteristic of interest.

90. The method of claim 75, further comprising:

calculating an arithmetic mean and standard deviation for the plurality of the first
measures, and an arithmetic mean and standard deviation for the plurality of second
measures;

determining a distribution for the plurality of first measures based on said arithmetic
mean and standard deviation of the plurality of first measures;

determining a distribution for the plurality of second measures based on said
arithmetic mean and standard deviation of the plurality of second measures; and

determining a difference between said distribution for the plurality of first measures
and said distribution for the plurality of second measures, wherein if the difference between
said distribution of the plurality of first measures and said distribution of the plurality of
second measures is significant, either positive or negative, the interrogation position in the
nucleic acid segment is characterized as being associated with the phenotypic characteristic
of interest.

91. The method of claim 89, wherein the likelihood that the plurality of first measures and
the plurality of second measures are from different distributions is assessed by using a
formula of the form:

$t = \dfrac{\langle P'_1\rangle - \langle P'_2\rangle}{\cdots}$

where $t$ is the t-statistic in the t-test analysis; $\langle P'_1\rangle$ and $\langle P'_2\rangle$ are arithmetic means of the
pluralities of first and second measures, respectively; $\sigma_1$ and $\sigma_2$ are standard deviations of
the pluralities of first and second measures, respectively; and $N_1$ and $N_2$ are numbers of
members of the pluralities of first and second measures, respectively.

92. The method of claim 89, further comprising: 

comparing the t-statistic with a Student t-distribution using a number of degrees of
freedom, which is calculated by using an equation of the form $(N_1 + N_2 - 2)$, to obtain a p-value,
where if the p-value is lower than 0.05, the interrogation position in the nucleic acid
segment is characterized as being associated with the phenotypic characteristic of interest.
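Only the numerator of the claim 91 formula survives in the text. Because claim 92 uses $N_1 + N_2 - 2$ degrees of freedom, the sketch below assumes the pooled-variance two-sample t-statistic, $t = (\langle P'_1\rangle - \langle P'_2\rangle)/\sqrt{s_p^2(1/N_1 + 1/N_2)}$ with $s_p^2 = ((N_1-1)\sigma_1^2 + (N_2-1)\sigma_2^2)/(N_1+N_2-2)$. To stay within the Python standard library, the two-sided p-value is obtained by integrating the Student t density with Simpson's rule; `scipy.stats.ttest_ind` computes the same pooled test if SciPy is available. The function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import fmean, stdev


def pooled_t_statistic(x, y):
    """Two-sample t-statistic with pooled variance (assumed form of claim 91)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return (fmean(x) - fmean(y)) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def t_pdf(t, dof):
    """Density of Student's t-distribution with `dof` degrees of freedom."""
    log_norm = lgamma((dof + 1) / 2) - lgamma(dof / 2) - 0.5 * log(dof * pi)
    return exp(log_norm - (dof + 1) / 2 * log(1.0 + t * t / dof))


def two_sided_p(t, dof, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; `steps` must be even."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, dof) + t_pdf(b, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    central = total * h / 3.0  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def is_associated(x, y, alpha=0.05):
    """Claim 92: N1 + N2 - 2 degrees of freedom; associated if p < alpha."""
    t = pooled_t_statistic(x, y)
    return two_sided_p(t, len(x) + len(y) - 2) < alpha


case = [0.61, 0.63, 0.60, 0.62, 0.64]
control = [0.52, 0.50, 0.53, 0.51, 0.49]
print(is_associated(case, control))  # True
```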



93. The method of claim 92, further comprising: 

repeating the same steps for characterization of the interrogation position in the
nucleic acid segment on one or more further interrogation positions in the same or different
nucleic acid segments;

discarding the interrogation positions having p-values higher than a cutoff value, and 
characterizing the remaining interrogation position(s) as being associated with the phenotypic 
characteristic of interest. 

94. The method of claim 75, further comprising: 

calculating absolute values of the plurality of the first measures and calculating 
absolute values of the plurality of the second measures; 

ranking the absolute values of the plurality of the first measures; ranking the absolute
values of the plurality of the second measures; and

determining the rank-sum distribution for both the plurality of the first measures and 
the plurality of the second measures. 

95. The method of claim 94, wherein the rank-sum distribution is determined by using a
Wilcoxon rank-sum test.

96. The method of claim 95, further comprising: 

determining the p-value for the interrogation position, wherein a confidence level
higher than a selected value indicates that the interrogation position is a SNP position 
associated with the phenotypic characteristic of interest. 

97. The method of claim 95, further comprising:

determining the p-value for the interrogation position, wherein a confidence level 
higher than 99% indicates that the interrogation position is a SNP position associated with the 
phenotypic characteristic of interest. 
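Claims 94 to 97 call for a Wilcoxon rank-sum test on absolute values. The Python sketch below uses the conventional form of the test, ranking both pluralities together (claim 94 could also be read as ranking each plurality separately), with average ranks for ties and a normal approximation for the p-value without tie correction; a confidence above 99% corresponds to p < 0.01. `scipy.stats.ranksums` offers a library implementation; the function names here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(first, second):
    """Rank-sum W for `first` and a two-sided normal-approximation p-value."""
    x = [abs(v) for v in first]   # claims 94-95 rank absolute values
    y = [abs(v) for v in second]
    n1, n2 = len(x), len(y)
    ranks = average_ranks(x + y)  # conventional pooled ranking
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return w, p


case = [0.61, 0.63, 0.60, 0.62, 0.64, 0.65]
control = [0.52, 0.50, 0.53, 0.51, 0.49, 0.48]
w, p = rank_sum_test(case, control)
print(w, p < 0.01)  # 57.0 True
```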

98. The method of claim 6, wherein said interrogation position is at least two different
interrogation positions and the step of inputting the first measure includes i) inputting a 
plurality of first measures corresponding to the different interrogation positions, and the step 
of inputting the second measure includes ii) inputting a plurality of second measures 
corresponding to the same interrogation positions as in step i). 

99. The method of claim 98, further comprising: 

iii) repeating step i) to obtain a further plurality of first measures; and 

iv) repeating step ii) to obtain a further plurality of second measures. 

100. The method of claim 99, further comprising:

calculating the difference between the plurality of the first measures and the plurality
of the second measures for each of the different interrogation positions;

identifying the interrogation positions having a difference above a first threshold
value, and the interrogation positions having a difference below a second threshold value
between the plurality of the first measures and the plurality of the second measures;

calculating the difference between the further plurality of the first measures and the
further plurality of the second measures for each of the different interrogation positions;

identifying the interrogation positions having a difference above the first threshold
value, and the interrogation positions having a difference below the second threshold value
between the further plurality of the first measures and the further plurality of the second
measures;

characterizing those interrogation positions that are identified as having a difference
above the first threshold value in both the plurality and further plurality of measures as being
associated with the phenotypic characteristic of interest; and

characterizing those interrogation positions that are identified as having a difference
below the second threshold value in both the plurality and further plurality of measures as
being associated with the phenotypic characteristic of interest.

101. The method of claim 100, wherein the first threshold value separates the top 20% in
the distribution of the calculated differences from the bottom 80%, and the second threshold
value separates the bottom 20% in the distribution of the calculated differences from the top
80%, wherein those interrogation positions that have a difference in the top 20% or the
bottom 20% of the distribution of calculated differences are characterized as associated.

102. The method of claim 100, wherein the first threshold value separates the top 5% in the
distribution of the calculated differences from the bottom 95%, and the second threshold
value separates the bottom 5% in the distribution of the calculated differences from the top
95%, wherein those interrogation positions that have a difference in the top 5% or the bottom
5% of the distribution of calculated differences are characterized as associated.
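Claims 100 to 102 keep only positions that fall in the same tail of the difference distribution in both rounds of measurement. A minimal Python sketch, assuming the differences for each round are held in a dictionary keyed by a hypothetical position identifier and that the tails are taken by rank rather than by an explicit threshold value; the function names are hypothetical.

```python
def tail_positions(diffs, fraction):
    """Positions in the top and bottom `fraction` of the distribution of differences."""
    ranked = sorted(diffs, key=diffs.get)  # ascending by difference
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[-k:]), set(ranked[:k])


def replicated_calls(diffs_round1, diffs_round2, fraction=0.05):
    """Keep positions that land in the same tail in both rounds."""
    top1, bottom1 = tail_positions(diffs_round1, fraction)
    top2, bottom2 = tail_positions(diffs_round2, fraction)
    return (top1 & top2) | (bottom1 & bottom2)


round1 = {"p0": 0.12, "p1": -0.01, "p2": 0.03, "p3": 0.00, "p4": 0.07,
          "p5": 0.02, "p6": -0.09, "p7": 0.01, "p8": 0.05, "p9": -0.02}
round2 = {"p0": 0.10, "p1": -0.03, "p2": 0.01, "p3": 0.02, "p4": 0.01,
          "p5": 0.00, "p6": -0.08, "p7": 0.04, "p8": 0.06, "p9": -0.01}
print(sorted(replicated_calls(round1, round2, fraction=0.20)))  # ['p0', 'p6']
```

With `fraction=0.20` this follows claim 101 and with the default 0.05 it follows claim 102.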

103. The method of claim 6, further comprising: 

validating the characterization of the interrogation position. 

104. The method of claim 103, wherein the step of validating includes identifying a 
location of the nucleic acid segment in the human genome.

105. The method of claim 104, wherein if the nucleic acid segment is located in a coding
or regulatory region of a gene and the interrogation position was designated as associated
with the phenotypic characteristic of interest, then the gene is deemed to be associated with
the phenotypic characteristic of interest and the step of validating further includes cloning
and expressing the associated gene to produce a protein product and characterizing the
protein product.

106. The method of claim 104, wherein if the nucleic acid segment is located in a coding
or regulatory region of a gene, and the interrogation position was designated as associated
with the phenotypic characteristic of interest, then the gene is deemed to be associated with
the phenotypic characteristic of interest and the step of validating further includes regulating
expression of the associated gene in cells in vitro or in vivo and detecting changes of cells in
response to the regulation.

107. The method of claim 104, wherein if the nucleic acid segment is located in a coding
or regulatory region of a gene and the interrogation position was designated as associated
with the phenotypic characteristic of interest, then the gene is deemed to be associated with
the phenotypic characteristic of interest and the step of validating further includes:

screening a library of pharmaceutical candidates against the associated gene or gene
product; and

selecting the pharmaceutical candidates that modulate expression of the associated
gene or activity of the associated gene product.

108. A method for determining a relative allele frequency for an interrogation position in a
nucleic acid segment based on nucleotide variation at the interrogation position of the nucleic
acid segment relative to a reference nucleic acid segment, the nucleic acid segment with the
nucleotide variation being designated as an alternate nucleic acid segment, comprising:

determining a plurality of intensities of signals from the reference nucleic acid
segment, which are designated as $I_{R,1-i}$, wherein i is an integer equal to or larger than 2;

determining a plurality of intensities of signals from the alternate nucleic acid
segment, which are designated as $I_{A,1-j}$, wherein j is an integer equal to or larger than 2; and

determining the relative allele frequency of the interrogation position by calculating an
equation of the form:

$$\frac{\langle I_{R,1-i} \rangle}{\langle I_{A,1-j} \rangle + \langle I_{R,1-i} \rangle} \quad \text{or} \quad \frac{\langle I_{A,1-j} \rangle}{\langle I_{A,1-j} \rangle + \langle I_{R,1-i} \rangle},$$

where $\langle I_{R,1-i} \rangle$ is the average of the plurality of intensities of signals from the reference
nucleic acid segment, and $\langle I_{A,1-j} \rangle$ is the average of the plurality of intensities of signals
from the alternate nucleic acid segment.
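The ratio recited in claim 108 can be illustrated with a minimal Python sketch, assuming the replicate signal intensities for each allele are already available as lists of floats. The function name `relative_allele_frequency` and its `numerator` switch are hypothetical choices for this example, not part of the claim.

```python
from statistics import mean


def relative_allele_frequency(ref_intensities, alt_intensities, numerator="reference"):
    """Return <I_R>/(<I_A> + <I_R>) or <I_A>/(<I_A> + <I_R>).

    ref_intensities: signals I_{R,1..i} from the reference segment (i >= 2).
    alt_intensities: signals I_{A,1..j} from the alternate segment (j >= 2).
    numerator: hypothetical switch selecting which of the two claimed forms to compute.
    """
    if len(ref_intensities) < 2 or len(alt_intensities) < 2:
        raise ValueError("at least two signals per allele are required")
    r = mean(ref_intensities)  # <I_{R,1-i}>, arithmetic mean of reference signals
    a = mean(alt_intensities)  # <I_{A,1-j}>, arithmetic mean of alternate signals
    if numerator == "reference":
        return r / (a + r)
    if numerator == "alternate":
        return a / (a + r)
    raise ValueError("numerator must be 'reference' or 'alternate'")


# Hypothetical intensities, for illustration only.
print(relative_allele_frequency([900.0, 1100.0], [300.0, 500.0]))
```

With these made-up values the reference mean is 1000 and the alternate mean is 400, so the reference form gives 1000/1400 and the two forms always sum to 1.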

109. The method of claim 108, wherein the signals result at least from perfect match 
probes on an oligonucleotide array. 

110. The method of claim 108, wherein the average of intensities of signals is an arithmetic
mean of intensities of signals.

111. The method of claim 108, wherein the average of intensities of signals is a trimmed
mean of intensities of signals.
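Claims 110 and 111 differ only in how the signals are averaged. A trimmed mean discards a fixed fraction of the lowest and highest signals before averaging; the sketch below assumes a symmetric 10% trim per tail, a parameter the claims leave open, and the helper name `trimmed_mean` is hypothetical.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.1):
    """Mean after discarding int(proportion * n) values from each end of the sorted data.

    proportion is an assumed per-tail trim fraction; the claims do not fix it.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(proportion * len(data))  # number of signals cut from each tail
    return mean(data[k:len(data) - k])


# Hypothetical probe signals with one low and one high outlier.
signals = [10.0, 102.0, 98.0, 101.0, 99.0, 100.0, 97.0, 103.0, 100.0, 500.0]
print(mean(signals), trimmed_mean(signals))
```

Here the arithmetic mean (claim 110) is pulled up to 131.0 by the single bright spot, while the trimmed mean (claim 111) drops one value per tail and returns 100.0.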

112. The method of claim 108, wherein the average of the plurality of intensities of signals
from the reference nucleic acid segment, $\langle I_{R,1-i} \rangle$, excludes outlier signals; and the average of
the plurality of intensities of signals from the alternate nucleic acid segment, $\langle I_{A,1-j} \rangle$,
excludes outlier signals.
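Claim 112 does not say how outlier signals are identified. As one possible choice, assumed here and not taken from the claims, the sketch below drops signals whose modified z-score (0.6745 times the distance from the median, divided by the median absolute deviation) exceeds 3.5, then averages the rest. The helper name and cutoff are hypothetical.

```python
from statistics import mean, median


def mean_excluding_outliers(values, cutoff=3.5):
    """Average signals after dropping outliers by an assumed modified z-score rule."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    if mad == 0:
        # Scores are undefined when most values coincide; keep only values at the median.
        kept = [x for x in values if x == med]
    else:
        kept = [x for x in values if 0.6745 * abs(x - med) / mad <= cutoff]
    return mean(kept)


# Hypothetical replicate signals with one saturated spot.
print(mean_excluding_outliers([100.0, 102.0, 98.0, 101.0, 99.0, 400.0]))
```

The median-based rule is used because a single extreme spot inflates a standard deviation but barely moves the median or the MAD.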

113. The method of claim 108, wherein at least one of $\langle I_{R,1-i} \rangle$ or $\langle I_{A,1-j} \rangle$ is corrected.

114. The method of claim 113, wherein at least one of $\langle I_{R,1-i} \rangle$ or $\langle I_{A,1-j} \rangle$ is corrected for
background.

115. The method of claim 114, wherein at least one of $\langle I_{R,1-i} \rangle$ and $\langle I_{A,1-j} \rangle$ is corrected for
background and the relative allele frequency of the interrogation position is determined by
calculating an equation of the form:

$$\frac{\langle I_R^{PM} \rangle - \langle I_R^{MM} \rangle}{\left(\langle I_A^{PM} \rangle - \langle I_A^{MM} \rangle\right) + \left(\langle I_R^{PM} \rangle - \langle I_R^{MM} \rangle\right)},$$

wherein $\langle I_R^{PM} \rangle$ is the average of the plurality of intensities of signals from the reference
nucleic acid segment generated by perfect match probes on an oligonucleotide array; $\langle I_A^{PM} \rangle$
is the average of the plurality of intensities of signals from the alternate nucleic acid segment
generated by perfect match probes on the oligonucleotide array; $\langle I_R^{MM} \rangle$ is the average of the
plurality of intensities of signals from the reference nucleic acid segment generated by
mismatch probes on the oligonucleotide array; and $\langle I_A^{MM} \rangle$ is the average of the plurality of
intensities of signals from the alternate nucleic acid segment generated by mismatch probes
on the oligonucleotide array.
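The background-corrected ratio of claim 115 subtracts the mismatch (MM) probe average from the perfect match (PM) probe average for each allele before forming the fraction. A minimal sketch, assuming the four probe groups arrive as lists of floats and using a hypothetical function name; rejecting a non-positive corrected total is a design choice of this example, not of the claim.

```python
from statistics import mean


def corrected_relative_allele_frequency(ref_pm, ref_mm, alt_pm, alt_mm):
    """Compute (<I_R^PM> - <I_R^MM>) / ((<I_A^PM> - <I_A^MM>) + (<I_R^PM> - <I_R^MM>))."""
    ref = mean(ref_pm) - mean(ref_mm)  # background-corrected reference signal
    alt = mean(alt_pm) - mean(alt_mm)  # background-corrected alternate signal
    total = alt + ref
    if total <= 0:
        # MM signal at or above PM signal leaves no usable allele signal.
        raise ValueError("corrected signals must sum to a positive value")
    return ref / total


# Hypothetical PM and MM intensities for one interrogation position.
print(corrected_relative_allele_frequency(
    ref_pm=[1200.0, 1000.0], ref_mm=[150.0, 250.0],
    alt_pm=[400.0, 500.0], alt_mm=[100.0, 200.0],
))
```

With these made-up values the corrected reference signal is 900 and the corrected alternate signal is 300, giving 900/1200.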

116. The method of claim 115, wherein the average of intensities of signals is an arithmetic
mean of intensities of signals.

117. The method of claim 115, wherein the average of intensities of signals is a trimmed
mean of intensities of signals.

118. A computer-implemented method for characterizing a polymorphic marker in a
nucleic acid, comprising:

inputting into a computer system a first measure of relative allele frequency for the
polymorphic marker in a first sample, said first sample containing nucleic acids from a first
group of n individuals, wherein n is an integer equal to or larger than 2;

inputting into the computer system a second measure of relative allele frequency for the
polymorphic marker in a second sample, said second sample containing nucleic acids from a
second group of m individuals, wherein m is an integer equal to or larger than 2; and

analyzing in the computer system the first measure and the second measure to
characterize the polymorphic marker as being associated with a phenotypic trait of interest.

119. A computer-implemented method for characterizing an interrogation position in a
nucleic acid segment, comprising:

inputting into a computer system a group of first measures of hybridization probe
intensities corresponding to the interrogation position in the nucleic acid segment derived
from a first sample collected from a case group of n individuals, wherein n is an integer equal
to or larger than 2;

calculating in the computer system a group of first relative allele frequencies at the
interrogation position for the case group based on the group of first measures of hybridization
probe intensities;

inputting into the computer system a group of second measures of hybridization probe
intensities corresponding to the interrogation position in the nucleic acid segment derived
from a second sample collected from a control group of m individuals, wherein m is an
integer equal to or larger than 2;

calculating in the computer system a group of second relative allele frequencies at the
interrogation position for the control group based on the group of second measures of
hybridization probe intensities; and

analyzing in the computer system the group of first relative allele frequencies and
the group of second relative allele frequencies to characterize the interrogation position as
being associated with a phenotypic characteristic of interest.

120. The method of claim 119, wherein the group of the first measures of hybridization 
probe intensities are obtained by at least 2 sets of repetitive experiments, and the group of the 
second measures of hybridization probe intensities are obtained by at least 2 sets of repetitive 
experiments. 

121. The method of claim 119, wherein the step of analyzing includes:
calculating a mean of the group of the first relative allele frequencies;
calculating a mean of the group of the second relative allele frequencies; and
calculating the absolute difference between the mean of the group of the first relative
allele frequencies and the mean of the group of the second relative allele frequencies,
where if the absolute difference is equal to or above a predetermined threshold value, the
interrogation position is characterized as being associated with the phenotypic characteristic
of interest.
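The decision rule of claim 121 reduces to comparing the absolute difference of two group means against a threshold. A minimal sketch, assuming per-replicate relative allele frequencies are already computed for each group; the 0.05 default is an assumed value chosen from inside the 0.04 to 0.1 range of claim 123, and the function name is hypothetical.

```python
from statistics import mean


def is_associated(case_rafs, control_rafs, threshold=0.05):
    """Return (flag, diff) where diff = |mean(case) - mean(control)| and flag = diff >= threshold."""
    diff = abs(mean(case_rafs) - mean(control_rafs))
    return diff >= threshold, diff


# Hypothetical relative allele frequencies from three replicate measurements per group.
flag, diff = is_associated([0.62, 0.58, 0.60], [0.52, 0.50, 0.54])
print(flag, round(diff, 3))
```

The case mean is 0.60 and the control mean is 0.52, so the difference of 0.08 clears the assumed threshold and the position would be flagged.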

122. The method of claim 121, wherein the threshold value ranges from 0.02 to 0.2.

123. The method of claim 121, wherein the threshold value ranges from 0.04 to 0.1.

124. The method of claim 119, wherein the step of analyzing includes applying a statistical 
test to the group of the first relative allele frequencies and the group of second relative allele 
frequencies to characterize the interrogation position as being associated with a phenotypic 
characteristic of interest. 

125. The method of claim 124, wherein the statistical test is a t-test or rank-sum test. 
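Claim 125 names a t-test or a rank-sum test without fixing the variant. The sketch below computes, with the standard library only, Welch's unequal-variance t statistic (one t-test variant, assumed here) and the Mann-Whitney U statistic for the case group (pairs where the case value exceeds the control value, ties counted as one half). Turning either statistic into a p-value needs the corresponding null distribution, for example via `scipy.stats.ttest_ind` or `scipy.stats.mannwhitneyu`, which this sketch does not use.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch's two-sample t statistic using sample variances (unequal variances assumed)."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_sum_u(case, control):
    """Mann-Whitney U for the case group: count of case > control pairs, ties as 1/2."""
    u = 0.0
    for x in case:
        for y in control:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical relative allele frequencies for a case and a control group.
case, control = [0.60, 0.62, 0.58], [0.50, 0.52, 0.54]
print(welch_t(case, control), rank_sum_u(case, control))
```

In this made-up example every case value exceeds every control value, so U reaches its maximum of 3 × 3 = 9.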

126. The method of claim 119, wherein the interrogation position is at least two
interrogation positions, and further comprising:

calculating the difference between the group of the first measures and the group of the
second measures for each of the different interrogation positions; and

characterizing the interrogation positions having a difference above a first threshold
value, and the interrogation positions having a difference below a second threshold value,
between the group of the first measures and the group of the second measures as associated
with the phenotypic characteristic of interest.
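Claim 126 applies two thresholds across many positions. One reading, assumed here, is that the difference is signed (case minus control), so the first threshold catches positions enriched in cases and a lower second threshold catches positions depleted in cases. The position IDs, threshold values, and function name below are hypothetical.

```python
def flag_positions(differences, upper, lower):
    """Return sorted position IDs whose signed difference is above `upper` or below `lower`."""
    return sorted(pos for pos, d in differences.items() if d > upper or d < lower)


# Hypothetical signed case-minus-control differences for three interrogation positions.
diffs = {"pos1": 0.09, "pos2": -0.07, "pos3": 0.01}
print(flag_positions(diffs, upper=0.05, lower=-0.05))
```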

127. A computer-implemented method for characterizing an interrogation position in a
nucleic acid segment with a previously known location in the human genome, comprising:

inputting into a computer system a first measure of relative allele frequency at the
interrogation position in the nucleic acid segment derived from a first sample collected from a
first group of n individuals, wherein n is an integer equal to or larger than 2;

inputting into the computer system a second measure of relative allele frequency at
the interrogation position in the nucleic acid segment derived from a second sample collected
from a second group of m individuals, wherein m is an integer equal to or larger than 2; and
analyzing in the computer system the first measure and the second measure to
characterize the interrogation position.

128. The method of claim 127, wherein analyzing includes analyzing in the computer 
system the first measure and the second measure to characterize the interrogation position as 
being associated with a phenotypic characteristic of interest. 

129. The method of claim 128, wherein the nucleic acid segment is proximal to or within a
region of a candidate gene that is not previously known to be associated with the phenotypic
characteristic of interest.

130. The method of claim 128, wherein the nucleic acid segment is proximal to or within a
region of a candidate gene that is previously suspected of being associated with the phenotypic
characteristic of interest.

131. The method of claim 128, wherein the nucleic acid segment is proximal to or within an
untranslated region of a candidate gene that is not previously known to be associated with the
phenotypic characteristic of interest.

132. The method of claim 128, wherein the nucleic acid segment is proximal to or within an
untranslated region of a candidate gene that is previously suspected of being associated with the
phenotypic characteristic of interest.

133. Data processing apparatus for characterizing an interrogation position in a nucleic
acid segment, comprising:

a data processor; 

a storage device holding computer readable code in communication with the data 
processor, the computer readable code including: 

computer code which determines a first measure of relative allele frequency at
the interrogation position in the nucleic acid segment derived from a first sample
collected from a first group of n individuals, wherein n is an integer equal to or larger
than 2;

computer code which determines a second measure of relative allele frequency 
at the interrogation position in the nucleic acid segment derived from a second sample 
collected from a second group of m individuals, wherein m is an integer equal to or 
larger than 2; and 

computer code which analyzes the first measure and the second measure to 
characterize the interrogation position. 

134. The data processing apparatus of claim 133, wherein analyzing includes analyzing
the first measure and the second measure to characterize the interrogation position as being
associated with a phenotypic characteristic of interest.

135. The data processing apparatus of claim 134, wherein the data processing apparatus is
further in communication with another data storage device which stores a first measure of the
intensity of signals from a first probe set on an oligonucleotide array and a second measure of
the intensity of signals from a second probe set on the oligonucleotide array, wherein the first
measure of relative allele frequency is determined based on the first measure of intensity and
the second measure of relative allele frequency is determined based on the second measure of
intensity.

136. The data processing apparatus of claim 135, wherein the data processing apparatus is 
further in communication with an imaging device. 

137. The data processing apparatus of claim 136, wherein the imaging device is a scanner
that determines the first measure of intensity and the second measure of intensity.



138. A computer readable medium holding computer readable code for characterizing an
interrogation position in a nucleic acid segment and for carrying out the processes of:



determining a first measure of relative allele frequency at the interrogation position in 
the nucleic acid segment derived from a first sample collected from a first group of n 
individuals, wherein n is an integer equal to or larger than 2; 

determining a second measure of relative allele frequency at the interrogation position 
in the nucleic acid segment derived from a second sample collected from a second group of m 
individuals, wherein m is an integer equal to or larger than 2; and 

analyzing the first measure and the second measure to characterize the interrogation 
position. 

139. The computer readable medium of claim 138, wherein analyzing includes analyzing 
the first measure and the second measure to characterize the interrogation position as being 
associated with a phenotypic characteristic of interest. 